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Abstract. We present a full description of the N-probability density function of the galaxy 
number density fluctuations. This N-pdf is given in terms, on the one hand, of the cold dark 
matter correlations and, on the other hand, of the galaxy bias parameter. The method relies 
on the assumption commonly adopted that the dark matter density fluctuations follow a local 
non-linear transformation of the initial energy density perturbations. The N-pdf of the galaxy 
number density fluctuations allows for an optimal estimation of the bias parameter (e.g., via 
maximum-likelihood estimation, or Bayesian inference if there exists any a priori informa¬ 
tion on the bias parameter), and of those parameters defining the dark matter correlations, 
in particular its amplitude (os). It also provides the proper framework to perform model 
selection between two competitive hypotheses. The parameters estimation capabilities of the 
N-pdf are proved by SDSS-like simulations (both ideal log-normal simulations and mocks 
obtained from Las Damas simulations), showing that our estimator is unbiased. We apply 
our formalism to the 7th release of the SDSS main sample (for a volume-limited subset with 
absolute magnitudes M r < —20). We obtain b = 1.193 ± 0.074 and cfg = 0.862 ± 0.080, for 
galaxy number density fluctuations in cells of a size of 30h _1 Mpc. Different model selection 
criteria show that galaxy biasing is clearly favoured. 
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1 Introduction 

Observations of the distribution of the galaxies [1, 2] and state-of-the-art N-body simulations 
[3, 4] show that the large scale structure (LSS) of the universe forms a network of filaments, 
clusters and voids, mostly defined by the cold dark matter fluctuations. Galaxies trace the 
dark matter density held through the bias parameter, which links the evolution of the matter 
gravitational potential and the galaxy formation and distribution. Actually, determining the 
galaxy bias is not only useful to trace the dark matter structure, but also helps to understand 
the process of galaxy formation and distribution [5]. 

Depending on the galaxy type that is under consideration, different aspects of the LSS 
network are probed. For instance, Luminous Red Galaxies (LRGs) will be associated with 
dark matter halos, with the most luminous ones corresponding to large clusters (typically 
formed at the crossing of several filaments); spiral galaxies will be, however, better tracers 
for the filaments of the network, where they appear in a larger proportion; QSOs will serve 
as tracers of the distant universe, where galaxies are still in the phase of accretion of matter 
to the inner black hole; etc. 

LSS surveys as NVSS [6], 2MASS [7], SDSS [1] or VIPERS [8] have provided very useful 
information (in a large wavelength range) about galaxy bias: e.g. [9] for the NVSS, e.g. [10] 
for the 2MASS, e.g. [11] for the SDSS and e.g. [12] for VIPERS. In most of the literature, the 
bias has been estimated using the 3D-correlation function of galaxies [13-15], counts-in-cells 
statistics [2, 16-18], the projected correlation function [11, 19-21], the bi-spectrum [22, 23] 
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and higher order moments of the galaxy distribution [5, 24-26]. The bias can also be inferred 
estimating the 1-point distribution function (pdf) from counts in cells, assuming a model for 
the mass pdf and measuring the galaxy over-density, see e.g., [12, 16] and references therein. 
Related, though not identical reasoning, has been used in this paper. In particular, as pointed 
out by Kitaura, Jasche and Metcalf in [27], it is possible to make a Bayesian matter density 
field reconstruction assuming a log-normal prior and modelling the galaxy distribution by a 
Poissonian process. The log-normal model is also adopted as a prior for the density field in 
inference models for large-scale structure as the one introduced by Jasche and Kitaura [28] 
by means of the Hamiltonian Monte Carlo (HMC) algorithm. Other approaches to determine 
the galaxy bias are usually based on the use of multivariate probability distributions, typically 
Gaussian or lognormal distributions, which are well suited priors for Bayesian analyses (see 
e.g. [29-32] and references therein). 

Our method relies on the use of the whole set of N-pdf of the galaxy number density 
fluctuations. This multivariate probability density function depends on the bias parameter 
and the correlations of the underlying cold dark matter fluctuations. When this N-pdf is 
seen as a function of the bias parameter, it represents, in fact, the likelihood of the data 
(i.e., the galaxy catalogue) given the galaxy bias parameter. Therefore, the N-pdf provides 
a full description of the statistical properties of the galaxy number density field, and allows 
one to derive interesting bias estimators as the maximum-likelihood bias, the mean bias, 
etc. In addition, it also gives the opportunity of performing model selection among different 
galaxy biasing scenarios. Finally, our approach provides a coherent scheme for introducing 
any available information on the bias parameter in the form of prior probabilities, following 
the Bayes theorem. 

The paper is organized as follows. In section 2 the model is presented, and it illustrates 
how to derive the N-pdf of the galaxy number density fluctuations (section 2.1), as well as how 
to perform parameter estimation (section 2.2) and model selection (section 2.3). In section 3 
we present the real and simulated data used in this work: the real galaxy catalogues from 
the SDSS (section 3.1), a set of lognormal simulations following our model (section 3.2), and 
a set of mocks obtained from N-body cosmological simulations (section 3.3). We test the 
performance of our approach by analysing both sets of simulations in section 4. The results 
obtained from the application of the method to SDSS data are given in section 5. Finally, we 
give our conclusions in section 6. 

Except when noted otherwise, we use a fiducial flat ACDM cosmological model with 
parameters = 0.308, = 0.692, as = 0.8149 based on the Planck-2015 results [33]. 

All the distances used are comoving and are given in terms of the Hubble parameter h = 
-Ho/100 kms _1 Mpc _1 . 


2 The galaxy number density model 

As outlined in the Introduction, we aim to determine the N-pdf of the galaxy number density 
field, given the correlation properties of the dark matter density field, as well as the bias 
parameter b, that relates the galaxy formation with the fluctuations of the dark matter density 
field. 

The description of the galaxy density field via its N-pdf provides a powerful framework to 
derive detailed statistical analyses. In particular, the N-pdf allows for a maximum-likelihood 
estimation (or Bayesian estimation, in the case of introducing any possible prior knowledge) 
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of the parameters describing the galaxy distribution. In addition, model selection approaches 
can be also applied. 

These two approaches have been applied by [34, 35] to the N-pdf of a local Gaussian 
deviation of the cosmic microwave background temperature fluctuations. This deviation was 
given in terms of a non-linear perturbative expansion of a Gaussian held. In the Sachs-Wolfe 
regime, and under certain conditions, this perturbative model defaults on the weak non-linear 
coupling inhationary model [e.g., 36]. 

2.1 Distribution of the galaxy number density fluctuations 

Let us denote by po.ii the initial energy density at a given position i, and by po,b the mean 
initial energy density or the background initial energy density. Then, the initial energy density 
fluctuation at position i is nothing but: 


* Po,i - P0,b 

00,1 — , 

(2.1) 

P0,b 


and the initial energy density contrast is defined as: 


A -i | <r P0,i 

^0,2 — 1 + — -. 

(2.2) 

P0,b 


Trivially, (<5o,i) = 0 and (Ao i) = 1. As predicted by the standard inflation scenario [e.g. 37], 
the probability density function of the initial energy density fluctuations follows a multivariate 
normal distribution: 

f (<5o,i) = (27 t) 7V// ^ L |C 0 1 1/2 exp ’ ( 2 - 3 ) 

where N represents the number of positions (assuming a given discretization or pixelization 
of space) in which the field is realized, and the Co matrix provides the Gaussian field cor¬ 
relations (i.e., (<i'o,?Aj) = Co,ij)- Hereinafter, addition over repeated indices is assumed, i.e., 

e /O — 1 r v—'../V c e /^i —1 

00,j — 2^i=l l^j= 1 °0,i00,j'-'0ij • 

N-body simulations have been showing, since the mid 80s, that the dark matter density 
field shows a non-Gaussian behaviour, even when the seeds of the initial energy density 
perturbations are Gaussian [e.g., 38-41], This non-Gaussianity is a reflection of the non¬ 
linear nature of the gravitational instability. This hint provided by N-body simulations is 
also confirmed by large-scale surveys, that also demonstrate that the galaxy number density 
follows a non-Gaussian distribution. Among all the local non-linear transformations, the 
log-normal model [e.g., 42, 43] provides an excellent description for the galaxy distribution, 
at least for the lowest non-linear orders. This model also provides an excellent framework 
to derive analytical expressions, and it is the one adopted in this work. It is convenient to 
express the matter density contrast Aas a function of the initial energy density fluctuations 
through the following transformation: 

A mi = exp (a^- - (2.4) 

V °Ao 2 J 

where a is a constant and o\ a = Co,u = Conversely, the local inverse transformation 

is given by: 

$o,i = ^log A m;i + y) • ( 2 -5) 
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Attending to these definitions, it is straightforward to show that: 


a 2 = log (l + 4 m ) 

(A m ,i) = 1 


( 2 . 6 ) 


(A mj jA TO) j) — exp 


a 


v ® Ao 

Hq } ij — log (1 “I - C m ,ij ) • 
, 2 


Co, 


V 


where v\ m = C m} a, and i7o,ij = Cq,^. Given equations (2.3) and (2.5), it is trivial 


to compute the N-pdf associated with the cold dark matter density field: 

1 


ex P (-IviHoij'yj) 

f (A?n,i) = (2v t) n ' 2 | Ho | 1/2 


7=1 Am J 


where we have defined the auxiliary parameters y* as 

log ( 


Vi 


A r 


1 + 07 


(2.7) 


( 2 . 8 ) 


The galaxy number density fluctuations ( 5 g j j) are assumed to be related to the cold dark 
matter density through a local transformation in terms [44, 45] of the galaxy bias b: 


3g,i — bd m ,i 


A m,i — 


A g : i + 5—1 


(2.9) 


where A g ^ is the galaxy number density contrast. Given this relation, and taking into account 
equation (2.7), it is straightforward to compute the N-pdf associated with the galaxy number 
density field: 


1 exp (-hy^/y,- 

H 9 ' i} (2ir) N / 2 H 0 1 1//2 HjLi (Agj + 5 — 1) ’ 

where we can use equations (2.8) and (2.9) to express y* in terms of the galaxy number density 
fluctuations {5 g ^) and their correlations (C g = 6 2 C m ): 


( 2 . 10 ) 


Vi = log 


b 2 + al 


b 2 


(A g t i +6—1) 


( 2 . 11 ) 


where a\ g = C g ^ = 6 2 C' miii . 

2.2 Parameter estimation 

In the context of parameter estimation, the N-pdf given by equation (2.10) can be seen as 
the likelihood function of observations (the values of the galaxy number density contrast A g ^ 
at N positions in space) given a galaxy clustering model. This clustering model has two 
parts: on one hand the galaxy bias parameter 6, and on the other the covariance of the 
matter fluctuations C m fixed by the cosmological model. These covariances are defined as 
C m ,ij = {5m,i5 m ,j) j and are therefore equal to 

C m ,ij = imirij), (2.12) 
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where r'ij is the comoving distance between the centers of the cells i, j. and £ m (r ) is the matter 
correlation function predicted by the cosmological model, filtered to account for the finite size 
of the cells. In our case, given a model we obtain the matter power spectrum P m (k) using the 
CAMB 1 software [46], and calculate the corresponding £ m (r) via a Fourier transform using 
FFTLog [47], 

In the simplest case we can assume that the cosmological model is known, fixing C m , 
so the only free parameter is the bias b. In this case, we can express the likelihood as 

L (A g ,i|6) = / (A Sj i). (2-13) 

The advantages of exploiting the N-pdf for statistical analyses, as parameter estimation or 
model selection, are clear. In particular, given the A g i observed by a galaxy survey and the 
C m from the assumed cosmological model, we can obtain the maximum-likelihood estimate 
of galaxy the bias b, as well as alternative estimates like the mean value. In addition, if 
any a priori information on the bias parameter p ( b ) is available, it can be used together 
with equation (2.10) to provide the posterior probability of the bias parameter given the 
observations p(b\A gt i), following the Bayes’ theorem: 

p (b\Ag :i ) (x L (Ag :i \b) p (b) . (2.14) 

This posterior probability allows performing a full Bayesian parameter estimation, as well as 
Bayesian model selection (e.g., Bayesian evidence). 

In addition to the bias, we can also use this likelihood approach to constrain the param¬ 
eters of the cosmological model via the covariance matrix for matter fluctuations, C m . In 
particular, we focus here on constraining the amplitude of the matter power spectrum, jointly 
with the galaxy bias b. We parameterize this amplitude using the standard parameter cr 8 . 2 
In order to introduce this additional parameter in equation (2.10), we first assume a fiducial 
value crf d and compute the corresponding covariance matrix C m fid . The covariance matrix 
then depends on as as 



Introducing this expression for C m into equation (2.10), we can then interpret /(A 9) j) as the 
likelihood of the data A Si $ given the two parameters ( b , <r 8 ), 

L{Ag ti \b,as) = f (Ag ti ) . (2.16) 

In the same way as in the single-parameter case, this likelihood function can be used to obtain 
the joint maximum likelihood estimates for the two parameters b , ds or other estimates. We 
can also combine this likelihood with any prior on the parameters to obtain the joint posterior 
probability distribution, 


P (b, cr 8 | Ag t i) oc L (A Si i|6, cr 8 ) p (6, cr 8 ). (2.17) 

In this work we always use flat wide priors either on b (for equation 2.14) or on (6, <r 8 ) 
(for equation 2.17), so these relations are just a normalization of the posterior distributions. 

1 http://camb.info 

2 <jg corresponds to the matter density variance in spheres of radius R = 8/i -1 Mpc, when using a linear 
model extrapolated to z = 0. 
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However, these prior probabilities could be used, e.g., to add information coming from inde¬ 
pendent observations constraining ag¬ 
in principle, these two parameters could be degenerate, as they are both related to the 
overall galaxy clustering amplitude. This is the case in the analyses of galaxy bias based 
on two-point statistics, where the estimates of b are completely degenerate with ag, and in 
fact one can only constrain the quantity bag. When using the N-pdf, however, we also have 
information on the shape of the distribution that breaks this degeneracy. Our model predicts 
a multivariate log-normal N-pdf for the matter density contrast A m j, (eq. 2.7), but a different 
distribution for the galaxy density contrast Aj (eq. 2.10) when b ^ 1. Therefore, variations 
in ag change the overall variance of the distribution, while keeping the log-normal shape, while 
variations in b change both the overall variance and the shape of the distribution, breaking 
the degeneracy. 

One could also use this approach to constrain other cosmological parameters that affect 
the model covariance matrix C m , such as fi m . However, these parameters affect as well the 
estimation of the galaxy density Afrom the data, via the conversion from galaxy redshifts 
to distances. This means that the simple likelihood interpretation presented here is not valid 
in this case. For this reason, we keep all cosmological parameters except <7s fixed in our 
analysis. In section 4.2 we test the reliability of the method to this assumption. 


2.2.1 Estimation of the uncertainty on the parameters 

As described above, we can use the N-pdf of the galaxy density field to derive the maximum- 
likelihood estimation of the parameters b , ag. We estimate the uncertainty on these parame¬ 
ters using the Fisher matrix (F) formalism. 

In the case in which we keep <78 fixed and only fit for the bias b, the uncertainty a- b on 
our estimate b is given by 


a l = F b -1 ; n = 


r <9 2 4 

db 2 


(2.18) 


J b=b 


where 4 = logL (A ffi j| b). In the case in which we estimate both b and <78 from our method, 
the covariance matrix ^ of the estimated parameters b, dg is given by 



= F 


b,<r 8 



' dH 

d 2 £ 


db 2 

dbda 

’ 1 b,cr 8 

d' 2 l 

d 2 l 


dagdb 

da 2 


(b,crg)=(b,<7g) 


(2.19) 


where in this case t = logL (A g ^\b, ag). The diagonal terms of this matrix correspond to the 
variances of the estimated parameters 


— V Ci \; <7 CT - g — \fC~2 2 , 


( 2 . 20 ) 


while the diagonal term corresponds to the covariance between the two parameters. 


2.3 Model selection 

Even if a priori information on b is lacking, model selection can also be performed following 
a likelihood-based approach. In the context of the N-pdf of the galaxy density field, it is 
interesting to compare our full two-parameter model (defined by equations 2.15 and 2.16) to 
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an alternative no bias scenario (i.e., 6=1) in which we only fit for as- In this work, we consider 
the Akaike information criterion [AIC, 48], the Bayesian information criterion [BIC, 49], and 
the generalized likelihood ratio test (GLRT). These criteria have been already applied (and 
explained in detail) in the context of a N-pdf derived for a non-Gaussian model describing 
a local transformation [34], that could represent the weak non-linear coupling inflationary 
model [36], in the Sachs-Wolfe regime. Given a certain hypothesis H a , defined both by the 
maximum-likelihood parameters 9 a 3 and a maximum log-likelihood value £ a = logL (A g p\6 a ), 
the AIC and BIC are given by: 

AIC (H a ) = 2(N p - 4) (2.21) 

BIC (H a ) = 2 log N - (2.22) 

where N p is the number of free parameters in the model (in our case, either N p = 1 or 
N p = 2), and N is the number of data points (in our case, the number of grid points in which 
we computed A s ). Therefore, for instance, a given hypothesis H a is said to be favoured by 
the AIC with respect to an alternative Hp, if AIC ( H a ) < AIC (Hp). The same is applied 
to the BIC model selection procedure. Regarding the GLRT approach, H a is said to be 
favoured over Hp, at a u level (with u > 0), if log > v - 

3 Data 

In this section, we present the different data sets used in this work. In section 3.1 we explain 
the selection of our sample from the SDSS, and how we estimate the galaxy density held 
from it. Section 3.2 describes how we generated a set of log-normal simulations to test the 
consistency of our method. Finally, in section 3.3 we present the Las Damas set of mocks 
which we used to test the method in realistic simulations. 

3.1 SDSS Main galaxy sample 

We use the method described in section 2 to study the galaxy bias and matter power spec¬ 
trum amplitude for a galaxy sample drawn from the 7th data release [50] of the SDSS main 
catalogue. We used the data provided by the New York University Value Added Catalogue 
[NYU-VAGC, 51]. In order to avoid problems due to the radial selection function, we selected 
a volume limited sample with M r < —20 in the redshift range 0.033 < z < 0.106, where M r 
is the K + E corrected r-band absolute magnitude. We chose this redshift range to ensure 
that the sample is volume limited, and also to use a sample consistent with the correlation 
function analysis of [11] and the Las Damas mocks (see section 3.3). We convert the measured 
angles and redshifts into co-moving coordinates using our fiducial cosmological model based 
on the Planck-2015 results. 

In order to study the N-pdf as described in section 2, we need to estimate from this 
galaxy sample the galaxy overdensity held A g . We chose to study this held using a grid of 
cubic cells covering the volume of the survey. In order to choose the physical size of the cells, 
we studied the variance between cells obtained at different resolutions. Comparing to the 
variance obtained for Poisson catalogues of the same volume and density, we can quantify the 
relative effect of shot noise. Using this approach, we decided to adopt cubic cells with a side 

3 in our case, depending of the model, 6 = b or 6 = (b, erg) 
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of 30 h~ l Mpc, for which the observed variance is ~ 7 times larger than the Poisson noise, so 
we can assume that we are in the signal-dominated regime. 

Once our grid of cells is defined, we proceed as follows to estimate A g . In the first 
place, we compute the completeness for each of the cells (c,) as the combination of two 
components: the radial and the angular selection functions. For the former we assume a 
constant selection in the redshift range mentioned above, and for the latter we use the angular 
mask from the NYU-VAGC in the MANGLE software format [52, 53]. To simplify the selection 
function, we consider only the North Cap of the SDSS area. We use in our analysis only cells 
with a completeness Cj > 0.8. This leaves us with N = 582 valid cells, with a volume of 
V = 15.7 x 10 6 h -3 Mpc 3 , embedded in a box of 240 x 480 x 240 h~ 5 Mpc 3 . The total number 
of galaxies in the selected cells is N g = 90, 634. 

We obtain the number of cells n* in each of these accepted cells, and estimate the galaxy 
number density for each cell as 



(3.1) 


where V c = (30 h~ l Mpc) 3 is the volume of a cell. We finally compute the galaxy density 
contrast A gg , normalising p g j by the mean galaxy density (eq. 2.2). In Figure 1 we show a 
3D projection of this galaxy density field obtained from the SDSS data. The galaxy density 
contrast held A g i in the N cells estimated in this way is the quantity whose pdf is given by 
equation (2.10). As we already take into account the selection function here to define the 
‘valid’ cells, and to estimate the density, the method described in section 2 can be applied 
directly. We only have to take into account the positions of the selected cells to calculate the 
model covariance matrix C m . 


3.2 Lognormal simulations 

In order to test the method, we created a set of 100 lognormal realizations of the matter 
density held. These simulations are created following the same model described in section 2, 
and with the same survey geometry as the SDSS data (section 3.1), so they can be used to 
assess the consistency of our method. The input matter power spectrum P m (k ) for these 
simulations corresponds to the best-ht model to the temperature + lensing Planck-2015 data 
[33, table 4, column 2], computed using the CAMB software. 

We generate the lognormal simulations using the method described in [54] (which is 
equivalent to the one in [55]), which is based on the fact that a lognormal random held is a 
local transformation of a Gaussian held. We compute the matter correlation function £ m (r) 
via a Fourier transform of P m (k). We obtain the correlation function for the initial Gaussian 
held £ 0 (r) using the transformation in equation (2.6), as 

= (~) l°g(! + &n(r))> (3.2) 

and the corresponding power spectrum Po(k) as its inverse Fourier transform. We then 
generate the Gaussian random held 5q in a cubic grid using the standard method of generating 
Gaussian Fourier modes with variances given by Po(k) and then performing a Fast Fourier 
Transform (see e.g. section 7.4.1 of [56]). Finally, we obtain the corresponding matter density 
held A m i using the transformation given by equation (2.4). 

To avoid possible boundary problems, we initially generate each of these lognormal 
simulations in a box of (1440 h~ l Mpc) 3 , with cells of side 30 h~ 1 Mpc. We then use the cell 
completeness defined for the SDSS data to define the same geometry with N = 582 valid cells, 
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Figure 1 . 3D projection of the galaxy number density field corresponding to the SDSS catalogue 
used in this work. The colour palette used in the projection is shown at the bottom and corresponds 
to densities 1 < A g < 2 from left to right. 


and we keep only the value of the density field Ain those cells. In this way, we ensure 
that the grid points we use are always at a distance > 480 h^ 1 Mpc from any of the borders 
of the box. 

For each of our 100 simulations of the matter density field Agiven a value for the 
galaxy bias b, we can generate the corresponding galaxy density field A g g using equation (2.9). 
In section 4.1 we will explore the results we obtain for four input values of the bias: b = 
0.5,1.0,1.5, 2.0. In Figure 2 we represent 3D-projections of the galaxy number density field for 
these bias values obtained from one of our simulations (therefore the underlying realization of 
matter fluctuations is the same for the 4 cases shown). It can be seen that, as it is expected, 
the clustering increases when the bias parameter is getting larger. We can compare this 
projection to that of the real data shown in Figure 1. 

3.3 Las Damas set of mock catalogues 

As a further assessment of the N-pdf method, it is important to test it in more realistic 
simulated catalogues, which reproduce the observed distribution of galaxies. We chose to use 
the set of galaxy mocks obtained from the Las Damas simulations 4 [57, 58], and in particular 
the gamma release. These mock catalogues were created matching both the selection effects 
and the clustering properties of the SDSS-DR7 real catalogues, and are therefore optimal for 

4 http://lss.phy.vanderbilt.edu/lasdamas/ 
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Figure 2. Galaxy number density field, shown as a 3D projection, for one of the realizations of the 
lognormal simulations. The underlying cold dark matter density field is the same for all the cases, 
but the galaxy bias is different in each of the panels. Bias values are, from left to right and top to 
bottom, b = 0.5,1.0,1.5, 2.0. The colour palette used in the projection is the same as in Figure 1. 
Notice how clustering increases as bias grows. 


our purposes. We use the set of mocks corresponding to the galaxy selection M r < —20, to 
match the SDSS galaxy catalogue described in section 3.1. 

The mocks used here were obtained from the Esmeralda dark matter simulation: a set 
of 30 N-body realizations, each containing 1250 3 particles in a volume of (640 h~ l Mpc) 3 . 
These realizations are created using a standard ACDM model with parameters = 0.25, 
= 0.75, <t 8 = 0.8. We use this same cosmological model when analysing the Las Damas 
mocks. 

The simulations are populated by galaxies using the halo occupation distribution (HOD) 
formalism, with the HOD parameters tuned to reproduce the observed number density and 
projected correlation function w p (r p ) (at scales r p £ [0.3, 30] h~ 1 Mpc, as studied in [11]) 
of the corresponding SDSS catalogues. Finally, the mock galaxy catalogues are obtained 
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Figure 3. 3D projection of the galaxy number density field corresponding to one of the Las Damas 
mocks. The colour palette used in the projection is the same as in Figures 1 and 2. Notice the 
similarity of the clustering in this case and that of the real data shown in Figure 1. 


by applying the SDSS angular selection mask from NYU-VAGC and appropriate redshift 
limits. Both selections match those we used to select the real SDSS catalogue presented in 
section 3.1. We use the North-only version of the mocks for which four non-overlapping mocks 
are obtained from each simulation box. Hence, we end up with a total of 120 mock galaxy 
catalogues. 

For each of the mocks, we compute the galaxy density field Ausing the same method 
described in section 3.1 for the SDSS data, including the use of the angular and radial selection 
functions, and keeping only cells with completeness c* > 0.8. For consistency, we use here the 
same cosmological model used to create the simulations (which is slightly different from the 
Planck-2015 model we use elsewhere). This results in a slightly different number of valid cells 
{N = 589) with respect to the SDSS data catalogue. In section 4.2 we use the Las Damas 
mocks to test the reliability of our method with respect to the cosmological model used here. 
In Figure 3 we show a 3D projection of the galaxy density field for one of the Las Damas 
mocks. By comparing to Figure 1 we see that the clustering of the mock seems very similar 
to that of the real data. 

4 Tests of the method 

We tested the N-pdf method to constrain b, ag presented in section 2 using the two sets of 
simulated galaxy density fields presented in sections 3.2 and 3.3. First, in section 3.2 we 
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show the results of applying our method to the lognormal simulations generated using this 
same model for the density held. This is a way to assess the internal consistency of the 
method. Then, in section 4.2, we analyse the Las Damas mock catalogues. In this case, 
we can assess the reliability of the method in a realistic case when the basic assumptions 
(lognormal distribution of matter, linear biasing) are approximate but not exactly fulfilled. 

4.1 Application to lognormal simulations 

We investigated the performance of the N-pdf method presented in section 2 in the ideal case 
represented by the lognormal simulations described in section 3.2. In this way, we test the 
internal consistency of the method and, at the same time, we check whether the maximum- 
likelihood parameter estimator is, in fact, unbiased, and we explore how the estimator sensitiv¬ 
ity depends on the actual bias factor. We apply the method to the 100 simulations, for galaxy 
density fields generated with four different values of the true bias: 6 true = 0.5,1.0,1.5, 2.0. 

First, we applied the method to the simulated fields assuming that the cosmological 
model was fully known, so that C m was kept fixed to the true values of the matter correlation 
function, and we only fit for the galaxy bias b. In Figure 4 we present the histograms of the 
values of b recovered for each realization. Each panel of the figure corresponds to a different 
value of the true bias. In each panel, we list the mean value of our maximum-likelihood 
estimate b, together with its dispersion in the 100 simulations. For each simulation, we also 
estimate the Fisher matrix uncertainty on the bias according to equation (2.18). We also list 
in each panel the mean value and dispersion of the a^ obtained in this way. 

We notice first that, as expected, the bias estimation is unbiased and that, second, the 
error associated to the parameter estimation is very well described by the Fisher matrix. 
Finally, the parameter dispersion increases as the bias increases, but the relative error in the 
bias determination (i.e., ay/b) is almost constant for all the cases: ~ 2.3%. 

As a second step, we perform the fit in both the galaxy bias b and the power spectrum 
amplitude ag, as described by equations (2.15) and (2.16). We show the 2D distribution 
of the maximum-likelihood values (6, as) obtained from the 100 realisations in Figure 5, 
together with the marginal distributions of each of the two parameters. As before, each of 
the panels corresponds to a different true value of the bias. We also obtained in each case 
the parameter covariance matrix and the corresponding uncertainties using the Fisher matrix 
approach described in section 2.2.1. In each panel, we show as a dashed contour the la 
ellipse derived from the mean covariance matrix, and list the mean and dispersion of both 
the parameter estimations and their uncertainties. 

We obtain, again, that the maximum-likelihood estimators of both b and ag are unbiased. 
For the case of the bias, we obtain again that the Fisher matrix estimation of the uncertainty 
matches the observed dispersion of b. As in the first case, this uncertainty scales with the 
bias, although it is significantly larger now, with a^/b ~ 6.7%. This is understandable, as 
now there is additional freedom due to the value of ag being allowed to change. Regarding 
the uncertainty on as, we see that the Fisher matrix approach sligthly over-estimates it (by 
a ~ 9%). Hence, we can take it as a conservative uncertainty estimate. We also observe that 
the error on as does not depend on the (true) value of the bias. 

From the analysis of the lognormal simulations, we can conclude that our method is 
consistent, in the sense that, when its assumptions are fullhlled, it provides unbiased estimates 
of b and erg. Moreover, the Fisher matrix approach provides correct (or slightly conservative) 
estimates of the uncertainty. 
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Figure 4. Distribution of the maximum-likelihood estimates of the bias b, for the 100 lognormal 
simulations, for the case in which the amplitude of the matter power spectrum is fixed. Each of the 
panel corresponds to a different value of the true bias in the simulations. From left to right and top to 
bottom, these values are 6 t rue = 0.5,1.0,1.5, 2.0. Input values are given as vertical red lines, whereas 
the mean b over the simulations are represented by solid vertical green lines. The dashed green lines 
encompass the ler region around the mean. We note in each case the mean b and the mean uncertainty 
a a ~ obtained via the Fisher matrix. 


4.2 Application to Las Damas simulations 

To further test the reliability of the N-pdf method, and its applicability to the real distribution 
of galaxies in the Universe, we applied it to the Las Damas mocks described in section 3.3, 
which provide a test bench closely matching the galaxy distribution in the SDSS catalogue. We 
first applied the method, both fitting only for the bias and for both the bias and the amplitude 
erg, using as fiducial model the true cosmological model used to create the simulations (see 
section 3.3). Regarding the input value of b for the simulations, we know that they were 
built to match the projected correlation function of the real SDSS data. We therefore take 
as the ‘true’ bias the value of b = 1.20 ± 0.01 obtained by [11] for the corresponding SDSS 
catalogue 5 by a HOD fit to w p {r p ). We note that other fitting methods in that same work 
gave compatible values of the bias with significantly larger error. 

The results of the application of the N-pdf method to the Las Damas mocks are shown 
in Figure 6. The left panel shows the distribution of maximum-likelihood estimates of the 
bias, b, when we keep all the cosmological parameters fixed to their true values. In this case, 

5 The analysis in [11] uses the same Las Damas cosmology, so we can use this value directly in our comparison 
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Figure 5. Distribution of the maximum-likelihood estimates of the parameters 6, erg, for the 100 
lognormal simulations, for the case in which we allow both parameters to change. From left to right 
and top to bottom, the panels correspond to different values of the true bias: 6 t rue = 0.5,1.0,1.5, 2.0. 
Blue dots represent the individual estimates for each realization and the histograms at the top and 
right of each panel show the marginal distributions for each parameter. Input values are given as red 
lines. The big green dot and solid lines correspond to the mean of the recovered parameters over the 
simulations. The dashed green lines in the histograms encompass the ler regions around the mean for 
each parameter. The green dashed ellipse represents the joint la region corresponding to the mean 
covariance matrix from the Fisher matrix approach. 


we obtain the estimates of the bias b = 1.2133 ± 0.0277. The uncertainties derived via the 
Fisher matrix are crj^ sh = 0.0271 ± 0.0008, which provide again a very good estimate of the 
true dispersion of the recovered b. The relative error on the estimated bias, v^/b = 2.3% is 
remarkably close to the one obtained for the lognormal simulations in the previous section. 
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Figure 6. (Left) Distribution of the maximum-likelihood estimates of the bias, for the 120 realiza¬ 
tions of the Las Damas mocks, for the case in which as is fixed. The vertical lines have the same 
meaning as in Figure 4. (Right) Distribution of the maximum-likelihood joint estimates of b and as, 
for these same mocks, for the case in which we fit for both parameters. The meaning of the different 
symbols and lines is the same as in Figure 5. 


The recovered bias is compatible, within the errors, with the ‘true’ value. We note that we 
do not expect here a perfect agreement, as the input bias was fixed using a different statistic 
(■ w p (r p )) and range of scales. 

The right panel of Figure 6 shows the 2D distribution of the estimates b, ds when we 
also allow the power spectrum amplitude to vary. In this case, we obtain again results very 
similar to the ones obtained for the lognormal simulations. We obtain a relative error of 
the bias cr^/b = 5.8%, with the true dispersion being well recovered by the Fisher matrix 
estimate. We recover the amplitude of the power spectrum as ds = 0.805 ± 0.069, in perfect 
agreement with the true value of erg = 0.8. The Fisher matrix estimate of the uncertainty, 
a^ sh = 0.074 ± 0.005, slightly overestimates the error on cfg. 

Overall, we obtain that the results for the Las Damas mocks are remarkably similar 
to those obtained for the case of the lognormal simulations and that the N-pdf method 
provides also in this case unbiased estimations of b and cr$. This is an indication that the 
basic assumptions of the method (lognormal distribution of the matter density field and linear 
biasing) are good enough approximations for our purposes in realistic distributions of galaxies, 
even if we now that they are not exact. In this way, we have validated the method, and we 
can confidently use it in real galaxy catalogues, as we do below. 

4.2.1 Robustness with respect to the fiducial cosmological model 

We also used the Las Damas mocks to assess how our results may depend on the assumed 
fiducial cosmology. We repeated the analysis for the 120 Las Damas mocks using three 
different cosmological models (in addition to the original one used above). We first tested 
the effect of changing the fiducial value of as alone. In this case, we kept all the rest of the 
cosmological parameters fixed to the true Las Damas cosmology, while changing the amplitude 
of the fiducial power spectrum. This results in a change of the covariance matrix C m used 
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Figure 7. Average maximum-likelihood constraints on b, erg, for the 120 Las Damas mock reali¬ 
sations, when assuming different fiducial cosmological models in the analysis. The black point and 
contour correpond to the true Las Damas cosmology and are the same as shown in the right panel of 
Figure 6. The blue and magenta colours correspond to the case in which we use the same Las Damas 
cosmology, except for the fiducial value of Ug. The green colour corresponds to the analysis performed 
assuming the Planck-2015 cosmological parameters from [33]. 


in the analysis (see equation 2.15). We studied two cases, cr| d = 0.7,0.9, which encompass in 
excess the range of plausible values given our present knowledge [see, e.g., 33]. 

As a second test, we considered an overall change in all the cosmological parameters, and 
used the set of values compatible with the Planck-2015 results that we use to analyse the real 
data. In addition to erg, the main parameter potentially affecting our analysis is f l m , which 
changes from VL m = 0.25 in the Las Damas cosmology to = 0.308 in the Planck-2015 
model. This change in parameters not only affects the covariance matrix C m used for the 
parameter estimation, but also the density held A g itself (as we change the relation between 
redshifts and distances). For this test, therefore, we repeat the estimation of A g with the new 
cosmology, following section 3.3. 

We summarize our results (for the case in which we fit for both b and as) in Figure 7 and 
Table 1. We see that, in all cases, the changes in the estimations of b, ag are much smaller 
than the uncertainties. In the case of the galaxy bias, the maximum relative change with 
respect to the default analysis is 0.4%, while the relative error of the measurement is 5.8%. 
Regarding the recovered value of ag, this relative change is 1.4%, compared to a relative error 
of 8.6%. The Fisher matrix estimation of the covariance matrix remains essentially unchanged 
for the different tested cosmologies. We can conclude, therefore, that our method is robust 
with respect to plausible changes in the fiducial cosmological model used in the analysis. 

5 Results for SDSS data 

We used the N-pdf method presented in section 2 to estimate the galaxy bias of the M r < — 20 
sample and as using the galaxy density held A g obtained from SDSS data as described 
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Fiducial cosmology 

b 

af ish 

b 

dg 

cr^ ish 

<78 

Las Damas 

Las Damas, erg = 0.7 
Las Damas, erg = 0.9 
Planck-2015 

1.214 ±0.070 

1.219 ±0.071 
1.216 ±0.071 

1.216 ±0.072 

0.079 ± 0.015 

0.079 ±0.015 
0.079 ±0.015 

0.080 ± 0.016 

0.805 ±0.069 

0.813 ±0.070 

0.802 ±0.069 

0.816 ±0.069 

0.074 ±0.005 

0.075 ± 0.005 
0.074 ±0.005 

0.075 ± 0.005 


Table 1. Results obtained for the maximum-likelihood estimates of the parameters (b, cf 8 ) and 
their Fisher-matrix uncertainties (tr^, cr<f 8 ), in the 120 Las Damas mocks, when using different fiducial 
cosmologies in the analysis, as explained in the text. In each case, we give the mean value and 
dispersion over the 120 realisations. 



> i--1-—i-1- ■ i- 
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Figure 8. Posterior probability distribution of the bias parameter obtained by analysing the SDSS 
catalogue described in section 3.1, for the case in which <jg is fixed at its fiducial value, <rf d = 0.8149. 
The solid vertical line marks the maximum-likelihood estimate b , and the dashed lines correspond to 
the Id interval estimated from the Fisher matrix analysis. 


in section 3.1. We first consider the case in which erg is fixed at its fiducial value (here, 
<7g d = 0.8149), and fit only for the galaxy bias. The posterior pdf of the bias in this case 
is shown in Figure 8, and the maximum-likelihood estimate obtained from the analysis is 
b = 1.238 ± 0.028. The mean bias is b = 1.240. 

This result can be compared to the bias estimated for the same SDSS sample by [11], 
using the projected correlation function w p (r p ). They obtained a value of b = 1.20 ± 0.01 
using HOD modelling, and b = 1.17±0.07 from the ratio of the galaxy and matter correlation 
functions in the range r p 6 [4, 30] h~ l Mpc (see their figure 7). 6 There is a small discrepancy 
between these measurements and our results at only the 1 — 2 a level. This difference may 
just be due to the different methods used. As the range of scales used is also different (in our 
case, the scales probed are larger than the size of the cells, and therefore > 30 /i -1 Mpc), this 

c These measurements correspond to b = 1.18 ± 0.01 and b = 1.15 ± 0.07, respectively, when we make the 
transformation from the fiducial value of erg used in [11] to the one used here. 
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Figure 9. Joint posterior probability distribution of the two parameters b , erg obtained by analysing 
the SDSS catalogue. The top and right sub-panels correspond to the marginalized probability distri¬ 
bution for each of the parameters separately. The green dot and dashed contour show the maximum- 
likelihood estimate of the parameters, and the la confidence region obtained from the Fisher matrix 
analysis. The red horizontal band corresponds to the ler region from the measurement of as by 
Planck-2015 [33]. We see that our constraints on <7g are fully compatible with the ones coming from 
CMB analysis. 


could also be related to a scale dependent bias [59-62], 

In Figure 9 we show the 2D joint posterior pdf for b and as for the case in which we 
fit for both parameters. The marginal posterior distributions are also shown at the top and 
right sides of the figure, respectively. The figure shows that we can indeed disentangle the 
constraints on the matter power spectrum from those on galaxy biasing. The corresponding 
maximum-likelihood estimates are b = 1.193 ± 0.074, <7$ = 0.862 ± 0.080. From the Fisher 
matrix estimate of the covariance matrix, we derive the correlation coefficient between the 
parameters as r = Cl2 = —0.937 

a b (J < f S 

The bias estimate obtained from this analysis has a larger uncertainty than in the previ¬ 
ous case (when as was fixed). However, it has the advantage of being an ‘absolute’ measure¬ 
ment, in the sense that it does not depend on choosing a particular value of as as is needed 
when using the usual 2-point clustering statistics. 

From this analysis we also measure the amplitude of the matter power spectrum at the 
redshift of the survey (^ me d = 0.083), which we express in terms of as- As shown in Figure 9, 
our measurement is in very good agreement with the value determined from Planck-2015 
data, <7g = 0.8149 ± 0.0093 [33]. We measure as with a precision of 9%, which is far from the 
~ 1% precision that can be achieved from CMB measurements. However, this measurement 


- 18 - 











Criterion 


AIC(Rq) - 
BIC(Rq) - 

GLRT: log 

AIC (Hi) 

BIC (Hi) 

(fe) at ^ = 1 

12.6 

8.22 

7.29 


Table 2. Results of different model selection criteria (from top to bottom: Akaike information 
criterion , Bayesian information criterion and generalized likelihood ratio test ) applied to the null 
hypothesis Hq (bias is fixed at b = 1, only og is allowed to vary), and the alternative hypothesis H\ 
(both bias b and as are allowed to vary). The different criteria provide either ‘substantial’ or ‘strong’ 
evidence in favour of the biased model (Hi). 


can be complementary as it is done using a different method at a very different redshift. It 
can be therefore used to test the consistency of the cosmological model, in a similar way as 
other low redshift estimates of a%, e.g. via lensing [63, 64], 

5.1 Model selection 

We have shown that the N-pdf method breaks the degeneracy between the bias and the power 
spectrum amplitude using the shape of the distribution: it is lognormal for the matter density 
field, but differs from it (as described by equation 2.10) for the galaxy density field. We can 
assess how effective the method is to make this distinction using the model selection criteria 
described in section 2.3. 

We compare two models. The alternative hypothesis H\ is the model used elsewhere in 
the paper, in which both parameters (b, a$) are left free, and the null hypothesis Ho is a model 
in which the value of galaxy bias is kept fixed at b = 1, and as is the only free parameter. 
Hypothesis Hq then corresponds to the case in which the N-pdf of the galaxy density field is 
lognormal (as that of matter), even if the clustering amplitude is allowed to change via as- 
We note that bias measurements relying on two-point statistics as the projected correlation 
function are unable to distinguish between these two models. 

The results obtained, for the three criteria AIC, BIC and GLRT are presented in 
Table 2. The best-fit parameters obtained for hypothesis H\ were presented in the previous 
section, in the case of Hq we obtain ds = 1.144 ± 0.039. As the AIC and BIC are approx¬ 
imations to the Bayes factors, we can use here the usual Jeffreys scale [65, 66] to assess the 
evidence of both models. According to that scale, the evidence against Hq is ‘strong’ in the 
case of the AIC, and ‘substantial’ for the BIC (this difference comes from the fact that the 
BIC penalizes more heavily the addition of extra parameters). The GLRT also favours H\ 
in a similar way. We can conclude from this test that the N-pdf of the galaxy density field 
does not follow a lognormal distribution, and it is better described by the biased model given 
by equation (2.10). 

6 Conclusions and discussions 

We have presented a full description of the N-probability density function of the galaxy 
number density fluctuations. The method follows the common assumption [e.g. 42, 43] that 
dark matter density fluctuations follow a local non-linear transformation of the initial energy 
density perturbations. The N-pdf of the galaxy number density fluctuations is given in terms 
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of the galaxy bias parameter and the cold dark matter correlations, parameterized, in our 
case, by its amplitude (os). 

The N-pdf provides us the most complete tool to perform statistical analyses, in par¬ 
ticular parameter estimation and model selection. Regarding the former, optimal parameter 
estimation can be performed via the maximum-likelihood, since the N-pdf of the galaxy num¬ 
ber density held, seen as a function of the bias and the as parameters, is nothing but the 
likelihood of the data (i.e., the galaxy number density realization) given these parameters. 
Even more, Bayesian inference can be also performed if any suitable information about the 
bias parameter is available in the form of a prior. 

In relation to model selection, we have explored some well known criteria based on the 
likelihood (notice that, in the case of known priors, other approaches as Bayesian evidence, 
could be followed): the Akaike information criterion, the Bayesian information criterion, and 
the generalized likelihood ratio test (GLRT). 

The methodology has been tested with SDSS-like simulations, both ideal log-normal 
realizations and mocks derived from the Las Damas project, showing, in both cases, that the 
maximum-likelihood estimates of the galaxy bias and the dark matter correlation amplitude 
are unbiased. We have applied our formalism to the 7th release of the SDSS main sample 
[50], for a volume-limited subset with magnitude M r < —20. We obtain b = 1.193T0.074 and 
<78 = 0.862 ± 0.080, for galaxy number density fluctuations in cells of the size of 30/i^ 1 Mpc. 
The b and ds errors are obtained from the Fisher matrix. These are in good agreement with 
alternative estimates, as the mean bias and amplitude derived from the N-pdf. 

The three model selection criteria mentioned above show that the alternative hypothesis 
(H\ of a galaxy bias parameter b given by the maximum-likelihood estimator) is favoured 
with respect to a no-biasing scenario given by the null hypothesis Ho of b = 1. 

Finally, we want to remark that our model assumes a constant bias over space, however 
the formalism can be generalized for a spatially varying bias. This is a more realistic situation, 
since it is well known that galaxy bias is scale-dependent and it evolves with redshift. This 
generalization is in progress, and will be addressed in a future paper. 
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